The Hazards in Segmentation of Handwritten Hindi Text
Abstract
Optical Character Recognition (OCR) is the process of recognizing handwritten or printed scanned text with the help of a computer. Segmentation is a very important stage of any text recognition system. Problems in segmentation can lower the segmentation rate and hence the recognition rate, while a good segmentation technique can improve the recognition rate. This paper deals with the hazards that occur in the segmentation of handwritten Hindi text. We also explain the main reasons for some of these problems.
Similar resources
A Structural Approach for Segmentation of Handwritten Hindi Text
This paper attempts to segment handwritten Hindi words. The problem of segmentation is compounded by the possible presence of modifiers (matras) on all sides of the basic characters and by the uncertainty that different writing styles introduce into the character shapes. We have devised a structural approach to capture the similarities and differences between structure class...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Language identification for handwritten document images using a shape codebook
Language identification for handwritten document images is an open document analysis problem. In this paper, we propose a novel approach to language identification for documents containing a mixture of handwritten and machine-printed text using image descriptors constructed from a codebook of shape features. We encode local text structures using scale and rotation invariant codewords, each repres...
Distinction between Machine Printed Text and Handwritten Text in a Document
In many documents, machine-printed and handwritten texts are intermixed. Optical Character Recognition (OCR) techniques are different for machine-printed and handwritten text, so it is necessary to separate these texts before giving them as input to the OCR. In this paper we propose a methodology for the Hindi language. This methodology is based on structural features of text. Experimental results on a data...
Recognition of Handwritten Devanagari Words Using Neural Network
Handwritten word recognition is an important problem in pattern recognition. Online handwriting recognition for Devanagari words is still at a developing stage and remains challenging due to the high complexity involved. In India, more than 300 million people use Devanagari script for documentation. There has been a significant improvement in the research related to the recognition of...